9 results for Mate-pair sequencing in DigitalCommons@The Texas Medical Center


Relevance: 30.00%

Abstract:

My dissertation focuses on two aspects of RNA sequencing (RNA-seq) technology. The first is the methodology for modeling the overdispersion inherent in RNA-seq data for differential expression analysis; this aspect is addressed in three sections. The second is the application of RNA-seq data to identify the CpG island methylator phenotype (CIMP) by integrating datasets of mRNA expression levels and DNA methylation status. Section 1: The cost of DNA sequencing has decreased dramatically in the past decade, and genomic research consequently depends increasingly on sequencing technology. However, it remains unclear how sequencing capacity influences the accuracy of mRNA expression measurement. We observe that accuracy improves as sequencing depth increases. To model the overdispersion, we use the beta-binomial distribution with a new parameter describing the dependence of overdispersion on sequencing depth. Our modified beta-binomial model outperforms both the binomial and the standard beta-binomial models, achieving a lower false discovery rate. Section 2: Although a number of methods have been proposed to analyze differential RNA expression accurately at the gene level, modeling at the base-pair level is also required. At the base-pair level, we find that the overdispersion rate decreases as sequencing depth increases. We propose four models and compare them with one another; as expected, our beta-binomial model with a dynamic overdispersion rate performs best. Section 3: We investigate biases in RNA-seq by examining the measurement of an external control, spike-in RNA. This study is based on two datasets with spike-in controls obtained from a recent study. We observe a previously unreported bias in the measurement of the spike-in transcripts that arises from the influence of the sample transcripts in RNA-seq. We also find that this influence is related to the local sequence of the random hexamers used for priming.
We propose a model of this inequality between samples and use it to correct this type of bias. Section 4: The expression of a gene can be turned off when its promoter is highly methylated. Several studies have reported a clear threshold effect in gene silencing mediated by DNA methylation. It is reasonable to assume that these thresholds are specific to each gene. It is also of interest to investigate genes that are largely controlled by DNA methylation; these genes are called “L-shaped” genes. We develop a method to determine the DNA methylation threshold and identify a new CIMP of breast cancer (BRCA). In conclusion, we provide a detailed understanding of the relationship between the overdispersion rate and sequencing depth. We also reveal a new bias in RNA-seq and characterize its relationship with the local sequence. Finally, we develop a powerful method to dichotomize methylation status, with which we identify a new CIMP of breast cancer with distinct molecular characteristics and clinical features.
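The abstract does not say how the new parameter ties overdispersion to depth. As a minimal sketch, assume the mean/overdispersion parameterization of the beta-binomial (mean p, overdispersion ρ in (0, 1), with α = p(1−ρ)/ρ and β = (1−p)(1−ρ)/ρ, so the variance is np(1−p)[1 + (n−1)ρ]) and a hypothetical power-law decay ρ(n) = ρ0·n^(−γ), where γ stands in for the depth-dependence parameter. This illustrates the idea only and is not the dissertation's model; `overdispersion`, `rho0` and `gamma` are names introduced here.

```python
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def overdispersion(n: int, rho0: float, gamma: float) -> float:
    """HYPOTHETICAL depth-dependent overdispersion rho(n) = rho0 * n**(-gamma).

    gamma = 0 keeps rho constant (ordinary beta-binomial); gamma > 0 makes
    overdispersion shrink as sequencing depth n grows.
    """
    return rho0 * n ** (-gamma)


def betabinom_logpmf(k: int, n: int, p: float, rho: float) -> float:
    """log P(K = k) for a beta-binomial with mean p and overdispersion rho in (0, 1)."""
    if not 0 <= k <= n:
        return float("-inf")
    # Map (p, rho) to Beta shape parameters; rho = 1 / (alpha + beta + 1).
    scale = (1.0 - rho) / rho
    alpha, beta = p * scale, (1.0 - p) * scale
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def depth_aware_loglik(counts, p: float, rho0: float, gamma: float) -> float:
    """Total log-likelihood of (k, n) pairs, each evaluated with its own rho(n)."""
    return sum(
        betabinom_logpmf(k, n, p, overdispersion(n, rho0, gamma)) for k, n in counts
    )


if __name__ == "__main__":
    # Sanity check: probabilities over k = 0..n should sum to 1.
    total = sum(exp(betabinom_logpmf(k, 20, 0.3, 0.1)) for k in range(21))
    print(f"pmf sum = {total:.6f}")
```

Under this assumption, setting γ = 0 recovers the ordinary beta-binomial, and letting ρ approach 0 recovers the binomial, so the three models compared in Section 1 sit on one continuum.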

Relevance: 20.00%

Abstract:

BACKGROUND AND PURPOSE: Familial aggregation of intracranial aneurysms (IA) strongly suggests a genetic contribution to pathogenesis. However, genetic risk factors have yet to be defined. For families affected by aortic aneurysms, specific gene variants have been identified, many of them affecting the receptors for transforming growth factor-beta (TGF-beta). In recent work, we found that aortic and intracranial aneurysms may share a common genetic basis in some families. We therefore hypothesized that mutations in TGF-beta receptors might also play a role in IA pathogenesis. METHODS: To identify genetic variants in TGF-beta and its receptors, TGFB1, TGFBR1, TGFBR2, ACVR1, TGFBR3, and ENG were directly sequenced in 44 unrelated patients with familial IA. Novel variants were confirmed by restriction digestion analysis, and their allele frequencies were compared between cases and individuals without known intracranial disease. Allele frequencies of a subset of known SNPs in each gene were also tested for association with IA. RESULTS: No mutations were found in TGFB1, TGFBR1, TGFBR2, or ACVR1. Novel variants identified in ENG (p.A60E) and TGFBR3 (p.W112R) were not detected in at least 892 reference chromosomes. ENG p.A60E showed significant association with familial IA in case-control studies (P=0.0080). No association with IA was found for any of the known polymorphisms tested. CONCLUSIONS: Mutations in TGF-beta receptor genes are not a major cause of IA. However, we identified rare variants in ENG and TGFBR3 that may be important for IA pathogenesis in a subset of families.
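The abstract reports P=0.0080 for ENG p.A60E but names neither the test nor the allele counts. A common choice for comparing allele counts in cases versus controls is a two-sided Fisher's exact test on a 2×2 table. The sketch below assumes that test and uses no data from the study; `fisher_exact_two_sided` is an illustrative name.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = cases / controls, columns = variant / reference
    allele counts. The p-value sums the hypergeometric probabilities of all
    tables with the same margins that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # A small relative tolerance keeps tables tied with the observed one.
    probs = (prob(x) for x in range(lo, hi + 1))
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))
```

For example, the classic table [[3, 1], [1, 3]] gives a two-sided p of 34/70 ≈ 0.486.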

Relevance: 20.00%

Abstract:

Cmd4 is a colcemid-sensitive CHO cell line that is temperature-sensitive for growth and expresses an altered β-tubulin, β₁. One revertant of this cell line, D2, exhibits a further alteration in β₁, resulting in an acidic shift in its isoelectric point and a decrease in its molecular weight to 40 kDa, as measured by two-dimensional gel electrophoresis. This β-tubulin variant has been shown to be assembly-defective and unstable. Characterization of the mutant β₁ in D2 by high-pressure liquid chromatography (HPLC) revealed the loss of methionine-containing tryptic peptides 7, 8, 9, and 10. Southern analysis of genomic DNA digested with several different restriction enzymes revealed new restriction fragments 250 base pairs shorter than the corresponding fragments from the wild-type β₁-tubulin gene. Northern analysis of mRNA from D2 revealed two new message products that also differed by 250 bases from the corresponding wild-type β-tubulin transcripts. To define the altered region precisely, the mutant and wild-type genomic β-tubulin genes were cloned and sequenced. A size-selected EcoRI genomic library was prepared using the Stratagene lambda ZAP II phage cloning system. Using subclones of CHO β-tubulin cDNA as probes, a 2.5 kb wild-type clone and a 2.3 kb mutant clone were identified from this library. Based on alignment with the published human β-tubulin sequence, each clone was shown to contain a portion of the gene extending from intron 3 through the end of the coding sequence in exon 4 and into the 3′ untranslated region. Sequencing of the mutant 2.3 kb clone revealed that the mutation is a 246 base pair internal deletion in exon 4 (base pairs 756-1001), which removes the sequence encoding amino acids 253-334.
This deletion results in the loss of a putative GTP-binding site, which could explain the phenotype of this mutant β-tubulin. In addition, comparison of the 3′ untranslated region across species revealed a conserved 200 base pair region with 78% homology. It is proposed that this region could play an important role in the regulation of β-tubulin gene expression.
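The reported coordinates can be checked with simple reading-frame arithmetic. The sketch assumes that the base-pair numbers are coding-sequence positions counted from the A of the start codon, which the abstract does not state explicitly. `codon_of` and `describe_deletion` are illustrative helpers.

```python
def codon_of(cds_pos: int) -> int:
    """1-based codon number containing a 1-based coding-sequence position."""
    return (cds_pos - 1) // 3 + 1


def describe_deletion(start: int, end: int) -> dict:
    """Summarize an internal deletion spanning CDS positions start..end (inclusive)."""
    length = end - start + 1
    return {
        "length_bp": length,
        "in_frame": length % 3 == 0,          # reading frame preserved downstream
        "codons_removed": length // 3,
        "first_codon_touched": codon_of(start),
        "last_codon_touched": codon_of(end),
    }


if __name__ == "__main__":
    # Coordinates as reported for the D2 mutant (assumed to be CDS-relative).
    print(describe_deletion(756, 1001))
```

Under that assumption, 1001 − 756 + 1 = 246 bp is a multiple of 3. The deletion therefore preserves the reading frame and removes 82 codons' worth of sequence, matching the 82 residues in 253-334. Position 756 is the third base of codon 252 and position 1001 is the second base of codon 334, so the junction re-forms a single codon from bases 754-755 and 1002.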

Relevance: 20.00%

Abstract:

Every x-ray attenuation curve inherently contains all the information necessary to extract the complete energy spectrum of a beam. To date, attempts to obtain accurate spectral information from attenuation data have been inadequate.

This investigation presents a mathematical model based on a Laplace transform pair, grounded in physical reality through the Laplace transformation, to describe the attenuation of a photon beam and the corresponding bremsstrahlung spectral distribution. In addition, the Laplace model has been extended mathematically to include characteristic radiation in a physically meaningful way, and a method for determining the fraction of characteristic radiation in any diagnostic x-ray beam was introduced for use with the extended model.

This work has examined the reconstructive capability of the Laplace pair model for photon beams ranging from 50 kVp to 25 MV, using both theoretical and experimental methods.

In the diagnostic region, excellent agreement was obtained between a wide variety of experimental spectra and those reconstructed with the Laplace model when the atomic composition of the attenuators was accurately known. The model successfully reproduced a 2 MV spectrum but had difficulty accurately reconstructing orthovoltage and 6 MV spectra. The 25 MV spectrum was successfully reconstructed, although agreement with the spectrum obtained by Levy was poor.

The analysis of errors, performed with diagnostic energy data, demonstrated the relative insensitivity of the model to typical experimental errors and confirmed that the model can be used to derive accurate spectral information theoretically from experimental attenuation data.
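Consider a beam with normalized spectrum w(E) passing through thickness x of a single attenuator. The transmitted fraction is T(x) = ∫ w(E) e^(−μ(E)x) dE. Regrouping the spectrum as a weight g(μ) over attenuation coefficients turns this into T(x) = ∫₀^∞ g(μ) e^(−μx) dμ, which is the Laplace transform of g. Recovering the spectrum from an attenuation curve therefore amounts to inverting a Laplace transform pair.

The abstract does not give the dissertation's specific pair. The sketch below instead uses a textbook one, g(μ) = a^v μ^(v−1) e^(−aμ) / Γ(v) ↔ T(x) = (a / (a + x))^v, with hypothetical parameters a and v in arbitrary units, and checks the pair numerically with Simpson's rule.

```python
from math import exp, gamma as gamma_fn


def g(mu: float, a: float, v: float) -> float:
    """Illustrative gamma-shaped weight over attenuation coefficient mu (v >= 1)."""
    return a ** v * mu ** (v - 1) * exp(-a * mu) / gamma_fn(v)


def transmission_numeric(x: float, a: float, v: float,
                         mu_max: float = 60.0, steps: int = 20_000) -> float:
    """T(x) = integral of g(mu) * exp(-mu * x) d(mu), by composite Simpson's rule.

    Integrates over [0, mu_max]; steps must be even, and mu_max large enough
    that g is negligible beyond it.
    """
    def integrand(mu: float) -> float:
        return g(mu, a, v) * exp(-mu * x)

    h = mu_max / steps
    total = integrand(0.0) + integrand(mu_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)  # Simpson weights 4, 2, 4, ...
    return total * h / 3


def transmission_closed_form(x: float, a: float, v: float) -> float:
    """Laplace transform of g: T(x) = (a / (a + x)) ** v."""
    return (a / (a + x)) ** v


if __name__ == "__main__":
    for x in (0.0, 0.5, 2.0):
        print(x, transmission_numeric(x, 2.0, 3.0), transmission_closed_form(x, 2.0, 3.0))
```

Going the other way, from a measured T(x) back to g(μ), is the hard inverse problem. A closed-form pair like this one reduces it to fitting a few parameters.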

Relevance: 20.00%

Abstract:

Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and are emerging as a major tool for systematically exploring the universe of variants and interactions across the entire genome. However, the data produced by next-generation sequencing technologies suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited to detecting associations with common variants but are less suitable for rare variants. This poses a great challenge for sequence-based genetic studies of complex diseases.

This dissertation used the genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools, to develop novel and powerful statistical methods for the next generation of association studies of both qualitative and quantitative traits using sequencing data, ultimately shifting the paradigm of association analysis from the current locus-by-locus analysis to the collective analysis of genomic regions.

In this project, functional principal component (FPC) methods coupled with high-dimensional data reduction techniques were used to develop novel and powerful methods for testing the association of the entire spectrum of genetic variation within a genomic segment or a gene, regardless of whether the variants are common or rare.

Classical quantitative genetic methods suffer from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their application, the functional linear models were applied to five quantitative traits in the Framingham Heart Study.

This project also proposed a novel concept of gene-gene co-association, in which a gene or a genomic region is taken as the unit of association analysis, and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets, leading to the discovery of networks significantly associated with psoriasis.
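A simplified sketch of the FPC idea follows. Assume each individual's genotypes (coded 0/1/2) at ordered positions in a region are samples of a function. The principal components of the centered genotype matrix then stand in for functional principal components, without the basis smoothing a full FPCA would use, and the trait is regressed on the top k scores. The resulting F statistic is compared against an F(k, n−k−1) distribution and tests the whole region at once rather than locus by locus. This is not the dissertation's statistic. `fpc_scores`, `fpc_f_statistic` and `k` are illustrative names, and the sketch assumes NumPy.

```python
import numpy as np


def fpc_scores(genotypes: np.ndarray, k: int) -> np.ndarray:
    """Top-k principal component scores of centered genotype 'curves'.

    genotypes: n individuals x m ordered positions, coded 0/1/2. This is a
    discretized stand-in for FPCA: no smoothing basis is fitted.
    """
    centered = genotypes - genotypes.mean(axis=0)
    # Rows of vt are orthonormal principal directions over the m positions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def fpc_f_statistic(trait: np.ndarray, genotypes: np.ndarray, k: int) -> float:
    """OLS F statistic for H0: the trait does not depend on the top-k scores.

    Under the usual normal-error assumptions it follows F(k, n - k - 1) under H0.
    """
    n = trait.shape[0]
    design = np.column_stack([np.ones(n), fpc_scores(genotypes, k)])
    coef, *_ = np.linalg.lstsq(design, trait, rcond=None)
    rss_full = float(np.sum((trait - design @ coef) ** 2))
    rss_null = float(np.sum((trait - trait.mean()) ** 2))
    return ((rss_null - rss_full) / k) / (rss_full / (n - k - 1))
```

Called once per gene, for example `fpc_f_statistic(trait, genotypes, k=3)`, it yields one region-level statistic. Common and rare variants enter the same scores rather than being tested one at a time.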

Relevance: 20.00%

Abstract:

A clone of the primary EcoRI family of human DNA sequences has been used as an indicator sequence for detecting alterations induced by a toxic agent. Specific clones of this family were examined and compared with the consensus sequence to determine the normal variability of this family. Although variations were observed, the data indicated that such clones can be used to study induced DNA modifications. This DNA was exposed to the toxic agent dimethyl sulfate under various conditions, and a distinct pattern of aberrations was shown to occur. We suggest that this approach be used to characterize the patterns of damage induced by various agents, with the ultimate goal of developing a system capable of monitoring human genotoxic exposure.
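Establishing normal variability by comparing clones with a consensus can be sketched as follows. The sketch assumes the clone sequences are already aligned to equal length, with gaps written as '-'. The toy sequences are hypothetical and do not come from the EcoRI family, and `consensus` and `divergence` are illustrative helpers.

```python
from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Majority base in each column of equal-length aligned sequences.

    Ties go to the base seen first in that column (Counter.most_common order).
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def divergence(seq: str, ref: str) -> float:
    """Fraction of aligned positions at which seq differs from ref."""
    if len(seq) != len(ref) or not ref:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(seq, ref)) / len(ref)


if __name__ == "__main__":
    clones = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]  # hypothetical toy clones
    ref = consensus(clones)
    for clone in clones:
        print(clone, f"{divergence(clone, ref):.3f}")
```

The divergences of untreated clones give a baseline. Changes after exposure would then count as induced only if they exceed that baseline.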

Relevance: 20.00%

Abstract:

Next-generation sequencing (NGS) technology has become a prominent tool in biological and biomedical research. However, NGS data analysis (such as de novo assembly, mapping, and variant detection) is far from mature, and the high sequencing error rate is one of the major problems. To minimize the impact of sequencing errors, we developed a highly robust and efficient method, MTM, to correct errors in NGS reads. We demonstrated the effectiveness of MTM on both single-cell data with highly non-uniform coverage and normal data with uniformly high coverage, showing that MTM’s performance does not depend on the coverage of the sequencing reads. MTM was also compared with Hammer and Quake, the best methods for correcting non-uniform and uniform data, respectively. For non-uniform data, MTM outperformed both Hammer and Quake. For uniform data, MTM performed better than Quake and comparably to Hammer. The better error correction achieved by MTM improved the quality of downstream analyses such as mapping and SNP detection. SNP calling is a major application of NGS technologies. However, the existence of sequencing errors complicates this process, especially for the low coverage (
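The abstract does not describe MTM's algorithm. For orientation, the sketch below shows a generic k-mer spectrum approach to read error correction, not MTM itself. Tools such as Quake build on the same basic idea: k-mers seen many times across reads are trusted ("solid"), and a base covered only by rare k-mers is a candidate error. The values of `k`, `min_count` and all function names are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def covering_windows(pos, length, k):
    """Start indices of all k-mer windows in a read of this length that contain pos."""
    return range(max(0, pos - k + 1), min(pos, length - k) + 1)


def all_solid(bases, windows, counts, k, min_count):
    """True if every listed k-mer window of `bases` is seen at least min_count times."""
    return all(counts["".join(bases[i:i + k])] >= min_count for i in windows)


def weak_positions(read, counts, k, min_count):
    """Positions not covered by any solid k-mer (every position, if len(read) < k)."""
    n = len(read)
    return [
        pos for pos in range(n)
        if not any(counts[read[i:i + k]] >= min_count for i in covering_windows(pos, n, k))
    ]


def correct_read(read, counts, k, min_count, alphabet="ACGT"):
    """Greedy single-base substitution at each weak position: keep the first
    alternative base that makes every k-mer covering that position solid."""
    bases = list(read)
    for pos in weak_positions(read, counts, k, min_count):
        windows = covering_windows(pos, len(bases), k)
        if all_solid(bases, windows, counts, k, min_count):
            continue  # already repaired by an earlier substitution
        for base in alphabet:
            if base == bases[pos]:
                continue
            trial = bases[:pos] + [base] + bases[pos + 1:]
            if all_solid(trial, windows, counts, k, min_count):
                bases = trial
                break
    return "".join(bases)
```

A single global `min_count` works when coverage is uniform. It breaks down with the highly non-uniform coverage of single-cell data, where genuine k-mers from poorly covered regions can look as rare as errors. That is the setting in which the abstract reports MTM's advantage.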

Relevance: 20.00%

Abstract:

Paracrine motogenic factors, including motility cytokines and extracellular matrix molecules secreted by normal cells, can stimulate metastatic cell invasion. For extracellular matrix molecules, both the intact molecules and their degradation products may exhibit these activities, and in some cases the degradation products have activities not shared by the intact molecules. We found that human peritumoral and lung fibroblasts secrete motility-stimulating activity for several recently established human sarcoma cell strains. The motility of lung metastasis-derived human SYN-1 sarcoma cells was preferentially stimulated by human lung and peritumoral fibroblast motility-stimulating factors (FMSFs). FMSFs were nondialyzable, susceptible to trypsin, and sensitive to dithiothreitol. Cycloheximide inhibited the accumulation of FMSF activity in conditioned medium; however, adding cycloheximide to the migration assay did not significantly affect motility-stimulating activity. Purified hepatocyte growth factor/scatter factor (HGF/SF), rabbit anti-hHGF antibody, and RT-PCR analysis of HGF/SF mRNA expression in peritumoral and lung fibroblasts indicated that FMSF activity was unrelated to HGF/SF. Partial purification of FMSF by gel exclusion chromatography revealed several peaks of activity, suggesting multiple FMSF molecules or complexes.

We purified the fibroblast motility-stimulating factor from human lung fibroblast-conditioned medium (HLF-CM) to apparent homogeneity by sequential heparin affinity chromatography and DEAE anion exchange chromatography. Lysyl endopeptidase C digestion of FMSF, followed by sequencing of the peptides purified by reverse-phase HPLC, identified it as an N-terminal fragment of human fibronectin. Purified FMSF stimulated predominantly chemotaxis, but also chemokinesis, of SYN-1 sarcoma cells and was chemotactic for a variety of human sarcoma cells, including fibrosarcoma, leiomyosarcoma, liposarcoma, synovial sarcoma, and neurofibrosarcoma cells.
The motility-stimulating activity present in HLF-CM was completely eliminated by either neutralization or immunodepletion with a rabbit anti-human fibronectin antibody, further confirming that the fibronectin fragment was the FMSF responsible for stimulating the motility of human soft tissue sarcoma cells. Because human soft tissue sarcomas have a distinctive hematogenous metastatic pattern (predominantly to the lung), FMSF may play a role in this process.